Adaptive Clustering through Semidefinite Programming
We analyze the clustering problem through a flexible probabilistic model that
aims to identify an optimal partition of the sample $X_1, \ldots, X_n$. We perform
exact clustering with high probability using a convex semidefinite estimator
that can be interpreted as a corrected, relaxed version of K-means. The estimator is
analyzed in a non-asymptotic framework and shown to be optimal or
near-optimal in recovering the partition. Furthermore, its performance is
shown to be adaptive to the problem's effective dimension, as well as to K, the
unknown number of groups in this partition. We illustrate the method's
performance in comparison with other classical clustering algorithms through
numerical experiments on simulated data.
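As a minimal sketch of the reformulation that semidefinite relaxations of K-means start from (not the corrected estimator described above), the K-means criterion of a partition can be written as a linear function of a "partnership" matrix B, with B[a][b] = 1/|G_k| when X_a and X_b share group G_k and 0 otherwise: the within-group sum of squares equals the sum of squared norms minus <X X^T, B>. Standard relaxations (for example Peng and Wei's) replace the set of such matrices by the convex set of positive semidefinite, entrywise nonnegative matrices with unit row sums and trace K. The correction term and the adaptivity to K are not implemented here, and all function names are hypothetical.

```python
# Hypothetical helper names; illustrates the identity
#   sum_k sum_{a in G_k} ||X_a - mean_k||^2 = sum_a ||X_a||^2 - <X X^T, B>
# on which SDP relaxations of K-means build. No correction term is applied.


def partnership_matrix(labels):
    """B[a][b] = 1/|G_k| if a and b both lie in group G_k, else 0."""
    sizes = {}
    for g in labels:
        sizes[g] = sizes.get(g, 0) + 1
    n = len(labels)
    return [[1.0 / sizes[labels[a]] if labels[a] == labels[b] else 0.0
             for b in range(n)] for a in range(n)]


def gram(X):
    """Gram matrix X X^T of the sample (rows are observations)."""
    return [[sum(u * v for u, v in zip(xa, xb)) for xb in X] for xa in X]


def kmeans_cost(X, labels):
    """Direct K-means criterion: squared distances to the group means."""
    cost = 0.0
    dim = len(X[0])
    for g in set(labels):
        members = [x for x, lab in zip(X, labels) if lab == g]
        mean = [sum(x[j] for x in members) / len(members) for j in range(dim)]
        cost += sum((x[j] - mean[j]) ** 2 for x in members for j in range(dim))
    return cost


def kmeans_cost_via_B(X, labels):
    """Same criterion written as trace(X X^T) - <X X^T, B>."""
    G = gram(X)
    B = partnership_matrix(labels)
    n = len(X)
    inner = sum(G[a][b] * B[a][b] for a in range(n) for b in range(n))
    return sum(G[a][a] for a in range(n)) - inner


X = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]]
labels = [0, 0, 1, 1]
print(kmeans_cost(X, labels), kmeans_cost_via_B(X, labels))  # 1.0 1.0
```

Because the criterion is linear in B, minimizing K-means amounts to maximizing <X X^T, B> over partnership matrices; relaxing that non-convex set to a convex one is what makes the problem a semidefinite program.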
Rankin-Cohen brackets on quasimodular forms
We equip the algebra of quasimodular forms with a collection of Rankin-Cohen
operators. These operators extend those defined by Cohen on modular forms and,
as for modular forms, the first of them provides a Lie structure on quasimodular
forms. They also satisfy a "Leibniz rule" for the usual derivation.
Rankin-Cohen operators are useful for proving arithmetic identities. In
particular, we give an interpretation of the Chazy equation and explain why such
an equation has to exist. Comment: 17 pages
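For orientation, here is a small sketch of Cohen's classical brackets on modular forms, the operators the abstract says are extended (the quasimodular versions themselves are not shown). With D = q d/dq and f, g of weights k and l, the n-th bracket is [f, g]_n = sum over r = 0..n of (-1)^r C(n+k-1, n-r) C(n+l-1, r) D^r f D^(n-r) g, a form of weight k + l + 2n; for n = 1 this is k f Dg - l Df g. The code works on truncated q-expansions; the truncation order N and all helper names are assumptions made for illustration.

```python
from math import comb

N = 6  # keep the q-expansion coefficients of q^0, ..., q^(N-1)


def sigma(power, n):
    """Divisor power sum sigma_power(n)."""
    return sum(d ** power for d in range(1, n + 1) if n % d == 0)


def eisenstein(weight, const):
    """Truncated E_weight = 1 + const * sum_{n>=1} sigma_{weight-1}(n) q^n."""
    return [1] + [const * sigma(weight - 1, n) for n in range(1, N)]


def D(f, times=1):
    """Apply D = q d/dq `times` times: the q^n coefficient is multiplied by n^times."""
    return [c * n ** times for n, c in enumerate(f)]


def mul(f, g):
    """Product of two truncated q-expansions, truncated again at q^(N-1)."""
    return [sum(f[i] * g[n - i] for i in range(n + 1)) for n in range(N)]


def rankin_cohen(f, k, g, l, n):
    """Cohen's n-th bracket of f (weight k) and g (weight l); weight k + l + 2n."""
    result = [0] * N
    for r in range(n + 1):
        coef = (-1) ** r * comb(n + k - 1, n - r) * comb(n + l - 1, r)
        term = mul(D(f, r), D(g, n - r))
        result = [a + coef * b for a, b in zip(result, term)]
    return result


E4 = eisenstein(4, 240)
E6 = eisenstein(6, -504)
bracket = rankin_cohen(E4, 4, E6, 6, 1)
print(bracket[:3])  # [0, -3456, 82944]
```

With E4 and E6, the first bracket is a weight-12 cusp form; Ramanujan's derivative identities give [E4, E6]_1 = -2(E4^3 - E6^2) = -3456 Δ, which is why the expansion starts -3456 q + 82944 q^2.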
Model Assisted Variable Clustering: Minimax-optimal Recovery and Algorithms
Model-based clustering defines population level clusters relative to a model
that embeds notions of similarity. Algorithms tailored to such models yield
estimated clusters with a clear statistical interpretation. We take this view
here and introduce the class of G-block covariance models as a background model
for variable clustering. In such models, two variables in a cluster are deemed
similar if they have similar associations with all other variables. This can
arise, for instance, when groups of variables are noise corrupted versions of
the same latent factor. We quantify the difficulty of clustering data generated
from a G-block covariance model in terms of cluster proximity, measured with
respect to two related, but different, cluster separation metrics. We derive
minimax cluster separation thresholds, which are the metric values below which
no algorithm can recover the model-defined clusters exactly, and show that they
are different for the two metrics. We therefore develop two algorithms, COD and
PECOK, tailored to G-block covariance models, and study their
minimax-optimality with respect to each metric. Of independent interest is the
fact that the analysis of the PECOK algorithm, which is based on a corrected
convex relaxation of the popular K-means algorithm, provides the first
statistical analysis of such algorithms for variable clustering. Additionally,
we contrast our methods with another popular clustering method, spectral
clustering, specialized to variable clustering, and show that ensuring exact
cluster recovery via this method requires clusters to have a higher separation,
relative to the minimax threshold. Extensive simulation studies, as well as our
data analyses, confirm the applicability of our approach. Comment: Main text: 38 pages; supplementary information: 37 pages
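The similarity notion above (two variables are alike when they have similar associations with all other variables) can be made concrete with a covariance-difference dissimilarity d(a, b) = max over c outside {a, b} of |Sigma[a][c] - Sigma[b][c]|. In a G-block covariance model Sigma = A C A^T + diag(gamma), this is exactly zero for variables in the same group. The sketch below is written in the spirit of this idea only: it is not claimed to be the paper's COD or PECOK procedure, the greedy thresholding rule and the threshold tau are hypothetical, and it uses the population covariance rather than an estimate.

```python
def g_block_covariance(groups, C, gamma):
    """Sigma = A C A^T + diag(gamma), A being the 0/1 group-membership matrix."""
    p = len(groups)
    return [[C[groups[a]][groups[b]] + (gamma[a] if a == b else 0.0)
             for b in range(p)] for a in range(p)]


def covariance_difference(S):
    """d(a, b) = max over c not in {a, b} of |S[a][c] - S[b][c]|."""
    p = len(S)
    d = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a + 1, p):
            diffs = [abs(S[a][c] - S[b][c]) for c in range(p) if c not in (a, b)]
            d[a][b] = d[b][a] = max(diffs, default=0.0)
    return d


def threshold_clusters(d, tau):
    """Hypothetical greedy grouping: seed each new cluster with the first
    unassigned variable and add every unassigned variable within tau of it."""
    p = len(d)
    labels = [-1] * p
    next_label = 0
    for a in range(p):
        if labels[a] == -1:
            for b in range(p):
                if labels[b] == -1 and (b == a or d[a][b] <= tau):
                    labels[b] = next_label
            next_label += 1
    return labels


groups = [0, 0, 0, 1, 1, 2, 2]
C = [[2.0, 0.5, 0.1], [0.5, 1.5, 0.3], [0.1, 0.3, 1.0]]
gamma = [0.5, 0.2, 0.3, 0.4, 0.1, 0.6, 0.2]
S = g_block_covariance(groups, C, gamma)
dist = covariance_difference(S)
print(threshold_clusters(dist, tau=0.1))  # [0, 0, 0, 1, 1, 2, 2]
```

The diagonal noise gamma never enters d because c ranges over variables other than a and b. With an estimated covariance matrix, within-group distances are only approximately zero, so the threshold must be calibrated; how the paper does this is not reproduced here.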
Enterprise Identity Management – Towards a Decision Support Framework Based on the Balanced Scorecard Approach
Enterprise Identity Management Systems (EIdMS) are IT-based infrastructures that need to be integrated into various business processes and related infrastructures. Assessing and preparing decisions on their introduction requires taking costs, benefits, and organizational settings into account. A variety of methods for the evaluation of, and decision support for, new IT (e.g. EIdMS) are discussed in the literature; however, these are typically based on single dimensions (e.g. financial or technological aspects). This paper proposes a multidimensional decision support framework based on the Balanced Scorecard concept. The presented approach introduces four perspectives and a related set of initial decision parameters to support decision making. The perspectives are (a) financial/monetary, (b) business processes, (c) supporting processes and (ICT) infrastructure, and (d) information security, risks and compliance. The perspectives and adaptable sets of decision parameters may also serve as a foundation for software-based decision support instruments.
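A software-based decision support instrument of the kind mentioned could start from a data structure like the one below. This is a hypothetical sketch: only the four perspective names come from the abstract, while the weighted-average aggregation, the 0 to 10 scoring scale, the example parameter names and their values are assumptions for illustration, not parameters or data from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Parameter:
    name: str      # hypothetical decision parameter
    weight: float  # relative importance within its perspective (assumed)
    score: float   # assessment on an assumed 0-10 scale


@dataclass
class Perspective:
    name: str
    weight: float  # relative importance of the perspective (assumed)
    parameters: list = field(default_factory=list)

    def score(self):
        """Weighted average of the parameter scores (assumed aggregation)."""
        total = sum(p.weight for p in self.parameters)
        if total == 0:
            return 0.0
        return sum(p.weight * p.score for p in self.parameters) / total


def overall_score(perspectives):
    """Weighted average of the perspective scores (assumed aggregation)."""
    total = sum(p.weight for p in perspectives)
    return sum(p.weight * p.score() for p in perspectives) / total


# Illustrative values only; parameter names are hypothetical.
scorecard = [
    Perspective("financial/monetary", 1.0,
                [Parameter("total cost of ownership", 1.0, 6.0)]),
    Perspective("business processes", 1.0,
                [Parameter("process automation", 1.0, 8.0)]),
    Perspective("supporting processes and (ICT) infrastructure", 1.0,
                [Parameter("integration effort", 1.0, 4.0)]),
    Perspective("information security, risks and compliance", 1.0,
                [Parameter("audit readiness", 2.0, 9.0),
                 Parameter("access risk reduction", 1.0, 6.0)]),
]
print(overall_score(scorecard))  # 6.5
```

Keeping parameters as data rather than hard-coded logic matches the idea of "adaptable sets of decision parameters": an organization can add, drop or reweight parameters without changing the aggregation code.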
PersLay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures
Persistence diagrams, the most common descriptors of Topological Data
Analysis, encode topological properties of data and have already proved pivotal
in many different applications of data science. However, since the (metric)
space of persistence diagrams is not a Hilbert space, persistence diagrams are
difficult inputs for most machine learning techniques. To address this concern, several
vectorization methods have been put forward that embed persistence diagrams
into either a finite-dimensional Euclidean space or (implicitly, through kernels)
an infinite-dimensional Hilbert space. In this work, we focus on persistence
diagrams built on top of graphs. Relying on extended persistence theory and the
so-called heat kernel signature, we show how graphs can be encoded by
(extended) persistence diagrams in a provably stable way. We then propose a
general and versatile framework for learning vectorizations of persistence
diagrams, which encompasses most of the vectorization techniques used in the
literature. Finally, we showcase the experimental strength of our setup by
achieving competitive scores on classification tasks over real-life graph
datasets.
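Vectorization frameworks of this kind are often written as a permutation-invariant operation applied to weighted point transformations, op({w(p) phi(p) : p in D}), where in a learned layer w and phi carry trainable parameters. The sketch below fixes those choices for illustration: persistence (death minus birth) as the weight w, Gaussian evaluations at fixed centers as phi (similar in spirit to persistence images), and sum or max as op. The centers, sigma and function names are assumptions, not the paper's parameterization.

```python
import math


def weight(point):
    """w(p): persistence (death - birth) of the point (an assumed choice)."""
    birth, death = point
    return death - birth


def phi(point, centers, sigma):
    """phi(p): Gaussian evaluations at fixed centers (an assumed choice)."""
    birth, death = point
    return [math.exp(-((birth - cx) ** 2 + (death - cy) ** 2) / (2 * sigma ** 2))
            for cx, cy in centers]


def perslay_like(diagram, centers, sigma=1.0, op=sum):
    """Vectorize a diagram as op over its points p of w(p) * phi(p), coordinatewise.

    `op` must be permutation invariant (sum, max, ...), so the output does not
    depend on the order in which the diagram's points are listed.
    """
    if not diagram:
        return [0.0] * len(centers)
    terms = [[weight(p) * v for v in phi(p, centers, sigma)] for p in diagram]
    return [op(column) for column in zip(*terms)]


diagram = [(0.0, 1.0), (0.5, 2.0), (1.0, 1.2)]
centers = [(0.0, 1.0), (1.0, 2.0)]
print(perslay_like(diagram, centers))          # sum aggregation
print(perslay_like(diagram, centers, op=max))  # max aggregation
```

Weighting by persistence lets points near the diagonal (short-lived features) contribute little, and the fixed-length output can be fed to any standard learner; making the centers and sigma trainable is what turns such a map into a learned layer.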
Optimal quantization of the mean measure and applications to statistical learning
This paper addresses the case where data come as point sets, or more
generally as discrete measures. Our motivation is twofold. First, we intend to
approximate, with a compactly supported measure, the mean of the measure-generating
process, which coincides with the intensity measure in the point
process framework, or with the expected persistence diagram in the framework of
persistence-based topological data analysis. To this end, we provide two
algorithms that we prove to be almost minimax optimal. Second, we build from the
estimator of the mean measure a vectorization map that sends every measure
into a finite-dimensional Euclidean space, and investigate its properties
through a clustering-oriented lens. In a nutshell, we show that for a mixture of
measure-generating processes, our technique yields a finite-dimensional
Euclidean representation that guarantees a good clustering of
the data points with high probability. Interestingly, our results apply in the
framework of persistence-based shape classification via the ATOL procedure
described in \cite{Royer19}.
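The two-step pipeline above (quantize the mean measure, then vectorize each measure against the resulting centers) can be sketched as follows. This is not either of the paper's two algorithms: as a stand-in, it runs plain Lloyd (k-means) iterations on the pooled points, which is legitimate because pooling all points with equal weight represents the empirical mean measure up to a constant factor, and that factor does not change the optimal centers. The farthest-point seeding, the Gaussian "mass around each center" vectorization, sigma, and the toy data are assumptions and are not claimed to be the ATOL contrast functions.

```python
import math


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return [list(c) for c in centers]


def lloyd(points, k, iters=20):
    """Plain Lloyd (k-means) iterations on the pooled points."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            buckets[nearest].append(p)
        for i, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous center
                centers[i] = [sum(coord) / len(bucket) for coord in zip(*bucket)]
    return centers


def vectorize(measure, centers, sigma=1.0):
    """Map a point set to R^k: Gaussian mass of the measure around each center."""
    return [sum(math.exp(-sq_dist(x, c) / (2 * sigma ** 2)) for x in measure)
            for c in centers]


# Toy sample of four measures (point sets): two near (0, 0), two near (5, 5).
measures = [
    [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)],
    [(0.2, 0.1), (0.1, 0.3)],
    [(5.0, 5.0), (5.5, 5.0)],
    [(5.0, 5.5), (4.8, 5.2), (5.1, 4.9)],
]
pooled = [x for m in measures for x in m]  # support of the empirical mean measure
centers = lloyd(pooled, k=2)
for m in measures:
    print([round(v, 3) for v in vectorize(m, centers)])
```

Measures drawn from the same component of the mixture end up with similar vectors, so any standard Euclidean clustering method can then be applied to the vectorized sample.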
Optimal quantization of the mean measure and application to clustering of measures
This paper addresses the case where data come as point sets, or more generally as discrete measures. Our motivation is twofold. First, we intend to approximate, with a compactly supported measure, the mean of the measure-generating process, which coincides with the intensity measure in the point process framework, or with the expected persistence diagram in the framework of persistence-based topological data analysis. To this end, we provide two algorithms that we prove to be almost minimax optimal. Second, we build from the estimator of the mean measure a vectorization map that sends every measure into a finite-dimensional Euclidean space, and investigate its properties through a clustering-oriented lens. In a nutshell, we show that for a mixture of measure-generating processes, our technique yields a finite-dimensional Euclidean representation that guarantees a good clustering of the data points with high probability. Interestingly, our results apply in the framework of persistence-based shape classification via the ATOL procedure described in \cite{Royer19}.
Behaviour problems, social competence and physical activity among adolescents
Behaviour problems are an important concern in schools. One of the principal means of intervention is social skills training. Such initiatives have shown only modest efficacy with respect to the transfer, maintenance and generalization of newly learned behaviours. The aim of this study is to compare the profiles of students with and without behaviour problems on various variables associated with social skills, psychosocial adaptation, the practice of physical activities and certain health-related lifestyle habits. Based on the results of the analysis, the authors propose new directions for intervention within social skills training programs for students with behaviour difficulties.
The fort and powder magazines of the Île Sainte-Hélène military complex
Built around 1820, the military complex on Île Sainte-Hélène comprised ramparts, an arsenal, storehouses, a small and a large powder magazine, a barracks, and various buildings serving the garrison. Our research, based on archaeological, historical and architectural data, analyzes the relationship between two of the complex's functions, namely storage and defence. We established that the fort was designed primarily with storage needs in mind, which is why it was built near the quay and at a similar elevation. Placing the fort below mont Wolf in this way reduced its defensive value, since an enemy could land on the west shore and attack the fort from the hill. On the other hand, we found that the powder magazines meet the standards and show a good balance between storage and defence requirements.
- …